[Transformations][MOE] Add MOE internal op and fuse vectorized MatMul experts into MOE #32183
Conversation
OV_OP_SCOPE(internal_MOE_validate_and_infer_types);
// TODO: Add inputs validation

set_output_type(0, get_input_element_type(0), get_input_partial_shape(0));
We can also use the shape of the weights to deduce dimension sizes if some of them are unknown in the input hidden_state.
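A minimal sketch of that idea, assuming the input order from the op's doc comment (hidden_states at port 0, w0_weight at port 3) and that hidden_size is the trailing dimension of both tensors; class and namespace names are illustrative, not the actual implementation:

```cpp
void ov::op::internal::MOE::validate_and_infer_types() {
    OV_OP_SCOPE(internal_MOE_validate_and_infer_types);

    // Output keeps the element type and shape of hidden_states (port 0).
    auto out_shape = get_input_partial_shape(0);         // [..., hidden_size]
    const auto& w0_shape = get_input_partial_shape(3);   // [num_experts, ..., hidden_size]

    // If hidden_size is dynamic in hidden_states, borrow it from the expert weights.
    if (out_shape.rank().is_static() && w0_shape.rank().is_static() &&
        out_shape.size() > 0 && w0_shape.size() > 0) {
        auto& hidden_size = out_shape[out_shape.size() - 1];
        const auto& w_hidden_size = w0_shape[w0_shape.size() - 1];
        if (hidden_size.is_dynamic() && w_hidden_size.is_static()) {
            hidden_size = w_hidden_size;
        }
    }

    set_output_type(0, get_input_element_type(0), out_shape);
}
```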
When you merge this transformation, please disable it in the GPU plugin until we support the MoE subgraph there, to prevent crashes when using the default packages.
/// (input to final multiplication)
/// 2: router_topk_output_indices - [..., topk] indices of selected top-k experts
/// 3: w0_weight - expert weights for first projection, shape [num_experts, inter_size, hidden_size] or
///               [num_experts, hidden_size, 2 * inter_size] if fused
I think [num_experts, hidden_size, 2 * inter_size] will be transposed to [num_experts, 2 * inter_size, hidden_size].
If there is a case where the weights are not transposed, we'll need a flag indicating whether the weight is transposed or not.
Or, if we can assume that the fused MoE always has the weights transposed, that would be best.
As we discussed, it will be adjusted to reflect MatMul(transpose_a=False, transpose_b=True);
related PR:
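For reference, a small sketch of that convention, assuming the fused gate/up weight is stored as [num_experts, 2 * inter_size, hidden_size] so the transpose is folded into the MatMul; the concrete sizes are only illustrative:

```cpp
#include <memory>

#include "openvino/op/matmul.hpp"
#include "openvino/op/parameter.hpp"

std::shared_ptr<ov::Node> make_fused_up_gate_example() {
    // hidden_states broadcast over experts: [num_experts, tokens, hidden_size]
    auto hidden = std::make_shared<ov::op::v0::Parameter>(
        ov::element::f32,
        ov::PartialShape{ov::Dimension::dynamic(), ov::Dimension::dynamic(), 4096});

    // Fused gate/up weight, already transposed: [num_experts, 2 * inter_size, hidden_size]
    auto w0 = std::make_shared<ov::op::v0::Parameter>(
        ov::element::f32, ov::PartialShape{8, 2 * 14336, 4096});

    // transpose_b=true: [E, T, H] x [E, 2I, H]^T -> [E, T, 2I]
    return std::make_shared<ov::op::v0::MatMul>(hidden, w0,
                                                /*transpose_a=*/false,
                                                /*transpose_b=*/true);
}
```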
This transformation is not enabled by default; it should be enabled in each plugin with MOE support.
Details: In this PR we introduce yet another operation, "GatherMatmul", which essentially performs gemv operations over the current tokens and the active experts. As a first step, we perform the gemv operation using dnnl::inner_product. This solution is clearly suboptimal: it does not give fine-grained control over parallelization, and when many tokens are processed by a single expert (prefill), a gemm operation may be more efficient, since the tokens can be batched and SIMD-level parallelization can be applied across tokens as well. This PR also contains all the essential transformations that enable a few common MoE patterns. The MoE pattern matcher is based on #32183.
Related oneDNN fork PR: openvinotoolkit/oneDNN#292
Tickets: CVS-171910
Co-authored-by: Vladislav Golubev <[email protected]>
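A conceptual reference of the gather-then-gemv idea described in the quoted PR (not the plugin's dnnl-based implementation); the layouts, names, and routing format are assumptions for illustration:

```cpp
#include <cstddef>

// weights:  [num_experts][out_dim][in_dim], flattened, MatMul(transpose_b=true) layout
// tokens:   [num_tokens][in_dim]
// topk_ids: [num_tokens][topk] indices of the active experts per token
// out:      [num_tokens][topk][out_dim]
void gather_matmul_ref(const float* weights, const float* tokens, const int* topk_ids,
                       float* out, std::size_t num_tokens, std::size_t topk,
                       std::size_t in_dim, std::size_t out_dim) {
    for (std::size_t t = 0; t < num_tokens; ++t) {
        const float* x = tokens + t * in_dim;
        for (std::size_t k = 0; k < topk; ++k) {
            // Gather the weight matrix of the k-th expert selected for this token.
            const auto e = static_cast<std::size_t>(topk_ids[t * topk + k]);
            const float* w = weights + e * out_dim * in_dim;
            float* y = out + (t * topk + k) * out_dim;
            // One gemv per (token, active expert) pair.
            for (std::size_t o = 0; o < out_dim; ++o) {
                float acc = 0.f;
                for (std::size_t i = 0; i < in_dim; ++i)
                    acc += w[o * in_dim + i] * x[i];
                y[o] = acc;
            }
        }
    }
}
```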
Details:
This transformation runs at compile time and is not enabled by default; it should be enabled in each plugin with MOE support.
Example registration of the fusion transformation for the CPU plugin: 41145cf
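A minimal sketch of what such an opt-in registration could look like on the plugin side; the pass name and header path below are hypothetical, and the real registration for the CPU plugin is in the commit referenced above:

```cpp
#include <memory>

#include "openvino/core/model.hpp"
#include "openvino/pass/manager.hpp"
#include "transformations/op_conversions/fuse_vectorized_moe.hpp"  // hypothetical header path

void apply_moe_fusion(const std::shared_ptr<ov::Model>& model) {
    ov::pass::Manager manager;
    // Not part of the common pipeline: a plugin that supports MOE opts in explicitly.
    manager.register_pass<ov::pass::FuseVectorizedMOE>();  // hypothetical pass name
    manager.run_passes(model);
}
```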
MOE internal op spec PR:
Preliminary requirements (offline transformations):
The patterns match MatMul(transpose_a=False, transpose_b=True); for batched MatMuls, a preliminary update of MatMulConstTransposesExtraction is needed:
Fusion of separate MatMul experts into vectorized (batched) MatMul:
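To illustrate the weight side of that fusion, a sketch of stacking per-expert 2-D weights into one batched weight for a single MatMul(transpose_b=True); the helper name and shapes are illustrative, not the transformation's actual code:

```cpp
#include <memory>

#include "openvino/op/concat.hpp"
#include "openvino/op/constant.hpp"
#include "openvino/op/unsqueeze.hpp"

// Before: per expert e, MatMul(hidden [T, H], w_e [I, H], transpose_b=true) -> [T, I]
// After:  MatMul(hidden [1, T, H], W [E, I, H], transpose_b=true)           -> [E, T, I]
std::shared_ptr<ov::Node> stack_expert_weights(const ov::OutputVector& expert_weights /* each [I, H] */) {
    ov::OutputVector unsqueezed;
    auto axis0 = ov::op::v0::Constant::create(ov::element::i64, ov::Shape{1}, {0});
    for (const auto& w : expert_weights) {
        // [I, H] -> [1, I, H], so the experts can be concatenated along a new leading axis.
        unsqueezed.push_back(std::make_shared<ov::op::v0::Unsqueeze>(w, axis0));
    }
    return std::make_shared<ov::op::v0::Concat>(unsqueezed, 0);  // [E, I, H]
}
```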
Tickets: